Introduction

The purpose of this tutorial is to use our edgelist data to analyse neuronal connectivity. An edgelist is a data frame of neuron-to-neuron connections, which describes a directed, weighted graph.
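As a minimal sketch (with made-up neuron IDs and counts, not real data), the same information can be viewed either as a data frame of connections or as a directed, weighted adjacency matrix:

```r
# Toy edgelist: each row is one directed pre -> post connection (made-up IDs)
toy_el <- data.frame(
  pre   = c("A", "A", "B"),
  post  = c("B", "C", "C"),
  count = c(12, 3, 7)
)

# The equivalent directed, weighted adjacency matrix
adj <- xtabs(count ~ pre + post, data = toy_el)
adj["A", "B"]  # 12
adj["B", "C"]  # 7
```

Functions such as igraph::graph_from_data_frame(), which we use later, consume exactly this edgelist format.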

Edgelist and Graph Representations

To do this, we need to think about directional connections between upstream presynaptic neurons (pre) and downstream postsynaptic ones (post).

We will represent the strength, i.e. “weight”, of these connections in two ways:

  1. Count (count): The number of chemical synaptic contacts between two neurons
  2. Normalized weight (norm): The count normalized by the total number of inputs that a target neuron (i.e., post) receives
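To make the norm definition concrete, here is a toy calculation (IDs and counts invented for illustration): each connection's count is divided by the total input its postsynaptic target receives.

```r
# Toy edgelist: neuron X receives 40 input synapses in total, Y receives 5
el <- data.frame(
  pre   = c("A", "B", "A"),
  post  = c("X", "X", "Y"),
  count = c(30, 10, 5)
)

# Total number of input synapses per postsynaptic neuron
total_input <- tapply(el$count, el$post, sum)

# Normalized weight: each connection's fraction of its target's total input
el$norm <- as.numeric(el$count / total_input[el$post])
el$norm  # 0.75 0.25 1.00
```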

Let’s start by choosing a dataset and a subset. By default, this notebook will look at BANC and the feeding and mechanosensation circuits of its suboesophageal zone (SEZ), roughly the lower part of the brain.

This zone is a relatively under-explored part of the connectome, made of several neuropils: GNG, FLA, SAD, PRW and AMMC.

You can instead choose to work with the full dataset, or a different subset.

Currently working with:

  • Dataset: banc_746
  • Subset: suboesophageal_zone
  • Data location: gs://sjcabs_2025_data (Google Cloud Storage)

Core tutorial

Visualizing the Suboesophageal Zone Neuropils

The suboesophageal zone contains several distinct neuropils. Let’s visualize their 3D structures to understand their spatial organization.

Note on 3D mesh organization:

  • Large anatomical regions (VNC, brain, etc.): obj/ directory
  • Smaller specific neuropils (GNG, FLA, AMMC, etc.): obj/neuropils/ subdirectory

# Extract base dataset name
dataset_base <- sub("_[0-9]+$", "", dataset)

# Download and read meshes as hxsurf objects for nat+plotly
read_neuropil <- function(search, data_path, dataset_base, ext="neuropils"){
  neuropil_path <- file.path(data_path, dataset_base, "obj", ext)
  neuropil_files <- suppressWarnings(system(paste("gsutil ls", neuropil_path), intern = TRUE))
  objs <- neuropil_files[grepl(search, basename(neuropil_files))]

  if (length(objs) == 0) {
    cat("  No files found matching:", search, "\n")
    return(NULL)
  }

  # Read and convert the first matching file (typically only one per search term)
  obj_file <- objs[1]
  temp_file <- tempfile(fileext = ".obj")
  on.exit(unlink(temp_file))  # clean up the temporary download on exit
  system(paste("gsutil cp", obj_file, temp_file))
  mesh <- as.hxsurf(rgl::readOBJ(temp_file))
  return(mesh)
}

# Read brain mesh (large region)
brain_mesh <- read_neuropil("brain", data_path, dataset_base, ext = "")

# Read SEZ neuropils (smaller specific regions)
gng_mesh <- read_neuropil("GNG", data_path, dataset_base)
fla_l_mesh <- read_neuropil("FLA_L", data_path, dataset_base)
fla_r_mesh <- read_neuropil("FLA_R", data_path, dataset_base)
sad_mesh <- read_neuropil("SAD", data_path, dataset_base)
prw_mesh <- read_neuropil("PRW", data_path, dataset_base)
ammc_l_mesh <- read_neuropil("AMMC_L", data_path, dataset_base)
ammc_r_mesh <- read_neuropil("AMMC_R", data_path, dataset_base)

# Plot all meshes in 3D with nat+plotly
nclear3d()

# Plot brain as context (very transparent)
dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)

# Plot SEZ neuropils with distinct colors; the dummy assignments suppress
# intermediate output, and the final unassigned call displays the plot
dummy1 <- plot3d(gng_mesh, col = "red", alpha = 0.5, add = TRUE)
dummy2 <- plot3d(fla_l_mesh, col = "blue", alpha = 0.5, add = TRUE)
dummy3 <- plot3d(fla_r_mesh, col = "lightblue", alpha = 0.5, add = TRUE)
dummy4 <- plot3d(sad_mesh, col = "green", alpha = 0.5, add = TRUE)
dummy5 <- plot3d(prw_mesh, col = "purple", alpha = 0.5, add = TRUE)
dummy6 <- plot3d(ammc_l_mesh, col = "orange", alpha = 0.5, add = TRUE)
plot3d(ammc_r_mesh, col = "lightsalmon", alpha = 0.5, add = TRUE)

Load Data

We will choose our dataset and optionally a pre-prepared subset. By default, let’s look at neurons of the SEZ.

# Extract base dataset name (e.g., "banc" from "banc_746")
dataset_base <- sub("_[0-9]+$", "", dataset)

# Construct paths
if (!is.null(subset_name)) {
  # Use subset data
  subset_dir <- file.path(data_path, dataset_base, subset_name)
  edgelist_path <- file.path(subset_dir,
                             paste0(dataset, "_", subset_name, "_simple_edgelist.feather"))

  cat("Using subset:", subset_name, "\n")
} else {
  # Use full dataset
  edgelist_path <- construct_path(data_path, dataset, "edgelist_simple")

  cat("Using full dataset\n")
}
## Using subset: suboesophageal_zone
# Always need full meta data
meta_path <- construct_path(data_path, dataset, "meta")

Now read in the chosen edgelist:

# Read edgelist
edgelist <- read_feather_smart(edgelist_path, gcs_filesystem = gcs_fs)
## Reading from GCS: gs://sjcabs_2025_data/banc/suboesophageal_zone/banc_746_suboesophageal_zone_simple_edgelist.feather 
## Downloading from GCS... (this may take several minutes for large files)
## Downloaded 15.21 MB from GCS
## Loading into memory with Arrow...
## ✓ Done! Loaded 500247 rows
# Display first few rows
head(edgelist)
## # A tibble: 6 × 5
##   pre                post               count    norm total_input
##   <chr>              <chr>              <int>   <dbl>       <int>
## 1 720575941603036601 720575941535142488    72 0.0124         5799
## 2 720575941555376356 720575941554573313     1 0.00261         383
## 3 720575941391131286 720575941559112807     6 0.0366          164
## 4 720575941436164384 720575941461382413     1 0.00331         302
## 5 720575941433324046 720575941595590951    37 0.00770        4804
## 6 720575941540575869 720575941567367412     2 0.00552         362

And get our meta data, subsetted by neurons present in the edgelist (pre + post):

# Read full meta data
meta_full <- read_feather_smart(meta_path, gcs_filesystem = gcs_fs)
## Reading from GCS: gs://sjcabs_2025_data/banc/banc_746_meta.feather 
## Downloading from GCS... (this may take several minutes for large files)
## Downloaded 9.84 MB from GCS
## Loading into memory with Arrow...
## ✓ Done! Loaded 168791 rows
# Get unique neuron IDs from edgelist
neuron_ids <- unique(c(edgelist$pre, edgelist$post))

# Subset meta data to neurons in edgelist
meta <- meta_full %>%
  filter(!!sym(dataset_id) %in% neuron_ids)

# Display first few rows
head(meta)
## # A tibble: 6 × 18
##   banc_746_id     supervoxel_id region side  hemilineage nerve flow  super_class
##   <chr>           <chr>         <chr>  <chr> <chr>       <chr> <chr> <chr>      
## 1 72057594149663… 762110688716… centr… right putative_p… <NA>  intr… visual_cen…
## 2 72057594152782… 757176625587… centr… right putative_p… <NA>  intr… central_br…
## 3 72057594144288… 755769938568… centr… right putative_p… <NA>  intr… central_br…
## 4 72057594165886… 764922683662… centr… left  putative_p… <NA>  intr… central_br…
## 5 72057594157762… 755066251124… centr… right putative_p… <NA>  intr… central_br…
## 6 72057594155576… 757885810585… centr… left  putative_p… <NA>  intr… central_br…
## # ℹ 10 more variables: cell_class <chr>, cell_sub_class <chr>, cell_type <chr>,
## #   neurotransmitter_predicted <chr>, neurotransmitter_score <dbl>,
## #   cell_function <chr>, cell_function_detailed <chr>, body_part_sensory <chr>,
## #   body_part_effector <chr>, status <chr>
cat("Meta data for", nrow(meta), "neurons\n")
## Meta data for 6188 neurons
cat("Unique pre neurons:", length(unique(edgelist$pre)), "\n")
## Unique pre neurons: 6186
cat("Unique post neurons:", length(unique(edgelist$post)), "\n")
## Unique post neurons: 6187

Meta Data Overview

Let’s get our bearings and have a look at the meta data for our chosen edgelist.

Let’s see what super_class and cell_class categories we have:

# Count by super_class
super_class_counts <- meta %>%
  filter(!is.na(super_class)) %>%
  count(super_class) %>%
  arrange(desc(n)) %>%
  as.data.frame() %>%
  mutate(super_class=as.character(super_class))

# Reorder for plotting (set factor levels explicitly for ggplotly compatibility)
super_class_counts$super_class <- factor(
  super_class_counts$super_class,
  levels = super_class_counts$super_class[order(super_class_counts$n)]
)

# Create subtitle for plotly compatibility
plot_subtitle <- if(!is.null(subset_name)) paste("Subset:", subset_name) else "Full dataset"

# Plot (swap x and y, no coord_flip for ggplotly compatibility)
p_super <- ggplot(super_class_counts, aes(y = super_class, x = n)) +
  geom_col(fill = "steelblue", alpha = 0.8) +
  #geom_text(aes(label = n), hjust = -0.2, size = 3) +
  labs(
    title = paste("Neuron Super Classes:", dataset),
    subtitle = plot_subtitle,
    y = "Super Class",
    x = "Number of Neurons"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text = element_text(size = 10)
  )

save_plot(p_super, paste0(dataset, "_super_class"))
ggplotly(p_super)

If we have sensory (afferent) or effector (efferent) neurons, let’s see what cell_class we have for each:

# Count by flow and cell_class
flow_subclass <- meta %>%
  filter(flow %in% c("afferent", "efferent"),
         !is.na(cell_class)) %>%
  count(flow, cell_class) %>%
  arrange(flow, desc(n)) %>%
  group_by(flow) %>%
  slice_head(n = 15) %>%  # Top 15 per flow
  ungroup()

if (nrow(flow_subclass) > 0) {
  # Reorder for plotting within each flow group
  flow_subclass <- flow_subclass %>%
    group_by(flow) %>%
    arrange(n) %>%
    mutate(cell_class = factor(cell_class, levels = unique(cell_class))) %>%
    ungroup()

  p_flow <- ggplot(flow_subclass,
                   aes(y = cell_class, x = n, fill = flow)) +
    geom_col(alpha = 0.8) +
    geom_text(aes(label = n), hjust = -0.2, size = 3) +
    facet_wrap(~flow, scales = "free_y", ncol = 1) +
    scale_fill_manual(values = c("afferent" = "#E69F00", "efferent" = "#56B4E9")) +
    labs(
      title = "Sensory (Afferent) and Effector (Efferent) Neurons",
      subtitle = "Top 15 cell classes per flow type",
      x = "Number of Neurons",
      y = "Cell Class"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14),
      legend.position = "none",
      strip.text = element_text(face = "bold", size = 11)
    )

  save_plot(p_flow, paste0(dataset, "_flow_subclass"), height = 10)
  ggplotly(p_flow)
}

Neurotransmitter Prediction and Connectivity Signs

To understand whether connections are excitatory or inhibitory, we can use predicted neurotransmitter information. The meta data contains neurotransmitter_predicted and neurotransmitter_score for each neuron.

We’ll first compute a consensus neurotransmitter for each cell type, taking the predicted transmitter with the highest mean prediction score across that type’s neurons. Then we’ll assign signs to connections:

  • Excitatory (sign: +1): acetylcholine, dopamine
  • Inhibitory (sign: -1): glutamate, GABA, histamine, serotonin, octopamine

This allows us to create signed connectivity weights (signed_norm) that capture both connection strength and likely sign.

# Compute consensus neurotransmitter for each cell_type
celltype_nt <- meta %>%
  filter(!is.na(cell_type), !is.na(neurotransmitter_predicted)) %>%
  group_by(cell_type, neurotransmitter_predicted) %>%
  summarise(
    mean_score = mean(neurotransmitter_score, na.rm = TRUE),
    n_neurons = n(),
    .groups = "drop"
  ) %>%
  group_by(cell_type) %>%
  slice_max(mean_score, n = 1, with_ties = FALSE) %>%
  select(cell_type, consensus_nt = neurotransmitter_predicted, nt_score = mean_score) %>%
  ungroup()

# Assign signs based on neurotransmitter
assign_sign <- function(nt) {
  case_when(
    tolower(nt) %in% c("acetylcholine", "dopamine") ~ 1,
    tolower(nt) %in% c("glutamate", "gaba", "histamine", "serotonin", "octopamine") ~ -1,
    TRUE ~ NA_real_
  )
}

celltype_nt <- celltype_nt %>%
  mutate(sign = assign_sign(consensus_nt))

cat("Cell types with neurotransmitter predictions:", nrow(celltype_nt), "\n")
## Cell types with neurotransmitter predictions: 2488
cat("Excitatory cell types:", sum(celltype_nt$sign == 1, na.rm = TRUE), "\n")
## Excitatory cell types: 1382
cat("Inhibitory cell types:", sum(celltype_nt$sign == -1, na.rm = TRUE), "\n")
## Inhibitory cell types: 1106

Now we’ll add the signed_norm column to our edgelist. For each connection, we multiply the normalized weight by the sign of the presynaptic neuron’s neurotransmitter:

# Add neurotransmitter info to edgelist via meta
edgelist <- edgelist %>%
  left_join(
    meta %>% select(id = !!sym(dataset_id),
                   pre_cell_type = cell_type),
    by = c("pre" = "id")
  ) %>%
  left_join(
    celltype_nt %>% select(cell_type, pre_sign = sign),
    by = c("pre_cell_type" = "cell_type")
  ) %>%
  mutate(
    # Use cell_type sign, default to excitatory if unknown
    pre_sign = coalesce(pre_sign, 1),
    signed_norm = norm * pre_sign
  ) %>%
  select(-pre_cell_type, -pre_sign)

cat("Edgelist now includes signed_norm column\n")
## Edgelist now includes signed_norm column
cat("Positive connections (excitatory):", sum(edgelist$signed_norm > 0, na.rm = TRUE), "\n")
## Positive connections (excitatory): 263419
cat("Negative connections (inhibitory):", sum(edgelist$signed_norm < 0, na.rm = TRUE), "\n")
## Negative connections (inhibitory): 236828

Here are some normalised density plots comparing the distributions of excitatory and inhibitory neuron-to-neuron connections by synaptic strength.

# Classify connections by sign
edgelist_signed <- edgelist %>%
  mutate(
    connection_type = case_when(
      signed_norm > 0 ~ "Excitatory",
      signed_norm < 0 ~ "Inhibitory",
      TRUE ~ "Unknown"
    )
  ) %>%
  filter(connection_type != "Unknown")

# Create density plot of absolute weights by connection type
p_signed_dist <- ggplot(edgelist_signed,
                        aes(x = abs(signed_norm),
                            color = connection_type,
                            fill = connection_type)) +
  geom_density(alpha = 0.3, linewidth = 1) +
  scale_x_log10() +
  scale_color_manual(
    values = c("Excitatory" = "red", "Inhibitory" = "blue"),
    name = "Connection Type"
  ) +
  scale_fill_manual(
    values = c("Excitatory" = "red", "Inhibitory" = "blue"),
    name = "Connection Type"
  ) +
  labs(
    title = "Distribution of Connection Strengths by Sign",
    subtitle = "Based on predicted neurotransmitter type",
    x = "Normalized Weight (absolute, log scale)",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

save_plot(p_signed_dist, paste0(dataset, "_signed_weight_distribution"))
ggplotly(p_signed_dist)
# Print summary statistics
edgelist_signed %>%
  group_by(connection_type) %>%
  summarise(
    mean_weight = mean(abs(signed_norm), na.rm = TRUE),
    median_weight = median(abs(signed_norm), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  print()
## # A tibble: 2 × 3
##   connection_type mean_weight median_weight
##   <chr>                 <dbl>         <dbl>
## 1 Excitatory          0.00496       0.00192
## 2 Inhibitory          0.00730       0.00223
cat("Excitatory connections:", sum(edgelist_signed$connection_type == "Excitatory"), "\n")
## Excitatory connections: 263419
cat("Inhibitory connections:", sum(edgelist_signed$connection_type == "Inhibitory"), "\n")
## Inhibitory connections: 236828

However, it is very important not to overplay this idea of assigning signs to connections. In the fly, glutamate can be excitatory or inhibitory. Moreover, inhibition does not only quell activity; it can also cause it, e.g. through disinhibition within networks. Consider the visual system: its sensory inputs, the photoreceptor neurons, are all histaminergic and inhibit their targets.

It is generally better to work with unsigned edgelists and methods. The exception is when examining direct connectivity, or building concise circuit diagrams.

Basic Network Statistics

Next, we can get a basic sense of the graph. We can do this using the library igraph.

We can see a scatter plot of both count and norm, and observe their correlation:

# Sample if too many points
if (nrow(edgelist) > 50000) {
  edgelist_sample <- edgelist %>% sample_n(50000)
} else {
  edgelist_sample <- edgelist
}

# Create scatter plot with marginal histograms
p_scatter <- ggplot(edgelist_sample, aes(x = count, y = norm)) +
  geom_point(alpha = 0.3, color = "steelblue", size = 1) +
  geom_smooth(method = "lm", color = "red", se = TRUE, alpha = 0.2) +
  scale_x_log10() +
  scale_y_log10() +
  labs(
    title = "Relationship between Synapse Count and Normalized Weight",
    x = "Synapse Count (log scale)",
    y = "Normalized Weight (log scale)",
    subtitle = sprintf("Correlation: %.3f (Spearman)",
                      cor(edgelist$count, edgelist$norm, method = "spearman"))
  ) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

save_plot(p_scatter, paste0(dataset, "_weight_correlation"))
p_scatter

We can see density plots of input and output degrees with different synapse count thresholds:

# Calculate in-degree and out-degree with two thresholds
# Threshold 1: count > 1
degree_threshold1 <- bind_rows(
  edgelist %>% filter(count > 1) %>% count(post, name = "degree") %>%
    mutate(type = "In-degree", threshold = ">1 synapse"),
  edgelist %>% filter(count > 1) %>% count(pre, name = "degree") %>%
    mutate(type = "Out-degree", threshold = ">1 synapse")
)

# Threshold 2: count > 10
degree_threshold10 <- bind_rows(
  edgelist %>% filter(count > 10) %>% count(post, name = "degree") %>%
    mutate(type = "In-degree", threshold = ">10 synapses"),
  edgelist %>% filter(count > 10) %>% count(pre, name = "degree") %>%
    mutate(type = "Out-degree", threshold = ">10 synapses")
)

# Combine
degree_data <- bind_rows(degree_threshold1, degree_threshold10) %>%
  mutate(
    threshold = factor(threshold, levels = c(">1 synapse", ">10 synapses"))
  )

# Plot normalized density
p_degree <- ggplot(degree_data, aes(x = degree, color = type, linetype = threshold)) +
  geom_density(alpha = 0.3, linewidth = 1) +
  scale_color_manual(
    values = c("In-degree" = "#E69F00", "Out-degree" = "#56B4E9"),
    name = "Direction"
  ) +
  scale_linetype_manual(
    values = c(">1 synapse" = "solid", ">10 synapses" = "dashed"),
    name = "Threshold"
  ) +
  scale_x_log10() +
  labs(
    title = "Degree Distribution by Synapse Count Threshold",
    subtitle = "Normalized density plots",
    x = "Degree (log scale)",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )
save_plot(p_degree, paste0(dataset, "_degree_distribution"))
ggplotly(p_degree)

Visualizing Direct Connectivity

We can use a network graph plot to look at connectivity when we only have a few nodes.

Since we have many neurons, we will first collapse our edgelist by super_class by joining with meta. We will remove cases where we have a super_class of NA:

# Collapse by super_class and remove self-connections
edgelist_super <- edgelist %>%
  left_join(meta %>% select(id = !!sym(dataset_id),
                           pre_super_class = super_class),
           by = c("pre" = "id")) %>%
  left_join(meta %>% select(id = !!sym(dataset_id),
                           post_super_class = super_class),
           by = c("post" = "id")) %>%
  filter(!is.na(pre_super_class), !is.na(post_super_class),
         pre_super_class != post_super_class) %>%  # Remove self-connections
  group_by(pre_super_class, post_super_class) %>%
  summarise(
    synapse_count = sum(count, na.rm = TRUE),
    weight = sum(count, na.rm = TRUE),
    n_connections = n(),
    .groups = "drop"
  ) %>%
  filter(synapse_count >= 50)  # Keep substantial connections (by synapse count)

# Create vertices
vertices_super <- data.frame(
  name = unique(c(edgelist_super$pre_super_class,
                 edgelist_super$post_super_class))
) %>%
  left_join(meta %>%
             count(super_class) %>%
             rename(name = super_class, size = n),
           by = "name")

# Create graph
g_super <- graph_from_data_frame(
  d = edgelist_super %>% select(pre_super_class, post_super_class,
                                weight, synapse_count),
  vertices = vertices_super,
  directed = TRUE
)

# Convert to tidygraph and add node attributes
g_super_tidy <- as_tbl_graph(g_super) %>%
  activate(nodes) %>%
  mutate(
    degree = centrality_degree(mode = "total"),
    super_class = name  # Add super_class column for coloring
  )

# Create layout with increased repulsion
layout_super <- create_layout(g_super_tidy, layout = "fr", niter = 1000)

# Create subtitle for plotly compatibility
network_subtitle <- paste(dataset, "-", if(!is.null(subset_name)) subset_name else "Full dataset")

# Plot with colors by super_class
p_network <- ggraph(layout_super) +
  geom_edge_arc(
    aes(width = weight, alpha = weight, color = node1.super_class),
    arrow = arrow(length = unit(3, 'mm'), type = "closed"),
    start_cap = circle(5, 'mm'),
    end_cap = circle(5, 'mm'),
    strength = 0.3
  ) +
  geom_node_point(aes(size = degree, color = super_class), alpha = 0.8) +
  geom_node_text(aes(label = name), repel = TRUE, size = 3, fontface = "bold") +
  scale_edge_width(range = c(0.2, 2), name = "Synapse\nCount") +
  scale_edge_alpha(range = c(0.5, 1.0), name = "Synapse\nCount") +
  scale_size_continuous(range = c(3, 10), name = "Degree") +
  scale_color_discrete(name = "Super Class") +
  scale_edge_color_discrete(name = "Source\nSuper Class") +
  labs(
    title = "Connectivity Network by Super Class",
    subtitle = network_subtitle
  ) +
  theme_graph() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 14)
  )

save_plot(p_network, paste0(dataset, "_network_super_class"), width = 12, height = 10)
p_network

Connectivity Matrix

One good way to look at connectivity directly is to visualize a connectivity matrix.

We will put pre on the rows and post on the columns.

Since we have many neurons, we will first collapse our edgelist by cell_type:

# Collapse by cell_type
edgelist_celltype_raw <- edgelist %>%
  left_join(meta %>% select(id = !!sym(dataset_id),
                           pre_cell_type = cell_type,
                           pre_flow = flow),
           by = c("pre" = "id")) %>%
  left_join(meta %>% select(id = !!sym(dataset_id),
                           post_cell_type = cell_type,
                           post_flow = flow),
           by = c("post" = "id")) %>%
  filter(!is.na(pre_cell_type), !is.na(post_cell_type))

# Calculate total inputs per post_cell_type for normalization
post_totals <- edgelist_celltype_raw %>%
  group_by(post_cell_type) %>%
  summarise(post_total_count = sum(count, na.rm = TRUE), .groups = "drop")

# Aggregate by cell type and recalculate norm
edgelist_celltype <- edgelist_celltype_raw %>%
  group_by(pre_cell_type, post_cell_type, pre_flow, post_flow) %>%
  summarise(
    total_count = sum(count, na.rm = TRUE),
    # Get the modal sign for this cell type pair (most common sign)
    modal_sign = ifelse(sum(signed_norm > 0, na.rm = TRUE) > sum(signed_norm < 0, na.rm = TRUE), 1, -1),
    .groups = "drop"
  ) %>%
  left_join(post_totals, by = "post_cell_type") %>%
  mutate(
    norm = total_count / post_total_count,
    # Use modal sign to create signed_norm (preserves magnitude)
    signed_norm = norm * modal_sign
  ) %>%
  select(-post_total_count, -modal_sign) %>%
  filter(total_count>=10)
head(edgelist_celltype)
## # A tibble: 6 × 7
##   pre_cell_type post_cell_type pre_flow post_flow total_count   norm signed_norm
##   <chr>         <chr>          <chr>    <chr>           <int>  <dbl>       <dbl>
## 1 AL-AST1       ALIN4          intrins… intrinsic          12 0.0123      0.0123
## 2 AL-AST1       ALIN7          intrins… intrinsic          33 0.0789      0.0789
## 3 AL-AST1       AN01A055       intrins… intrinsic          56 0.0177      0.0177
## 4 AL-AST1       CB0065         intrins… intrinsic          23 0.0211      0.0211
## 5 AL-AST1       CB0083         intrins… intrinsic          21 0.0184      0.0184
## 6 AL-AST1       CB0182         intrins… intrinsic          56 0.0553      0.0553

Next, we can choose the strongest cell type-to-cell type connections to visualize, i.e., those above the 95th percentile:

# Calculate threshold
threshold_95 <- quantile(edgelist_celltype$total_count, 0.95)

# Filter for strong connections
edgelist_strong <- edgelist_celltype %>%
  filter(total_count >= threshold_95)

cat("Connections above 95th percentile (>", round(threshold_95), "synapses):",
    nrow(edgelist_strong), "\n")
## Connections above 95th percentile (> 121 synapses): 2019

And then we can plot our connectivity matrix. We are looking at very many neurons, so let’s also make things more interesting by restricting to cell types directly connected to the motor neurons that control the proboscis.

# Proboscis motor cell types
proboscis_motor_cts <- meta %>%
  filter(grepl("proboscis",body_part_effector), 
         !is.na(cell_type)) %>%
  pull(cell_type)

# Prepare data for heatmap (aggregate first in case of duplicates)
conn_heatmap_data <- edgelist_celltype %>%
  filter(pre_cell_type %in% proboscis_motor_cts | post_cell_type %in% proboscis_motor_cts) %>%
  group_by(pre_cell_type, post_cell_type) %>%
  summarise(
    signed_norm = mean(signed_norm, na.rm = TRUE),
    .groups = "drop"
  )

# Create matrix for clustering (use absolute values for clustering)
conn_matrix <- conn_heatmap_data %>%
  tidyr::pivot_wider(
    names_from = post_cell_type,
    values_from = signed_norm,
    values_fill = 0
  ) %>%
  tibble::column_to_rownames("pre_cell_type") %>%
  as.matrix()
conn_matrix[is.na(conn_matrix)] <- 0
conn_matrix[is.infinite(conn_matrix)] <- 0

# Cap color scale at the 25th and 75th percentiles for better visibility
q25 <- quantile(conn_matrix, 0.25, na.rm = TRUE)
q75 <- quantile(conn_matrix, 0.75, na.rm = TRUE)
max_abs_val <- max(abs(c(q25, q75)))

# Ensure max_abs_val is not zero (which would cause non-unique breaks)
if (max_abs_val == 0 || is.na(max_abs_val)) {
  # Fall back to using the actual range of the data
  max_abs_val <- max(abs(conn_matrix), na.rm = TRUE)
  if (max_abs_val == 0 || is.na(max_abs_val)) {
    max_abs_val <- 1  # Default fallback
  }
}

# Create static heatmap with pheatmap
pheatmap(
  conn_matrix,
  clustering_method = "ward.D2",
  clustering_distance_rows = "euclidean",
  clustering_distance_cols = "euclidean",
  color = colorRampPalette(c("blue", "white", "red"))(256),
  breaks = seq(-max_abs_val, max_abs_val, length.out = 257),
  show_rownames = TRUE,
  show_colnames = TRUE,
  main = "Signed Connectivity Matrix: Proboscis Motor Circuit",
  filename = file.path(img_dir, paste0(dataset, "_conn_matrix_strong.png")),
  width = 10,
  height = 16
)

# Create interactive heatmap with Ward's clustering (no dendrograms shown)
p_matrix <- heatmaply(
  conn_matrix,
  dendrogram = "none",
  hclust_method = "ward.D2",
  distfun = function(x) dist(abs(x)),
  colors = colorRampPalette(c("blue", "cyan", "white", "yellow", "red"))(256),
  limits = c(-max_abs_val, max_abs_val),
  main = "Signed Connectivity Matrix: Proboscis Motor Circuit",
  xlab = "Postsynaptic Cell Type",
  ylab = "Presynaptic Cell Type",
  showticklabels = c(FALSE, FALSE),
  hide_colorbar = FALSE,
  fontsize_row = 6,
  fontsize_col = 6,
  key.title = "Signed\nWeight"
)

p_matrix

In general, one thing I find useful is to look at sensory neurons (flow == "afferent") and effector (i.e., motor or endocrine) neurons (flow == "efferent"), because they are quite interpretable.

For these neurons, cell_class and/or cell_sub_class is often the most useful label.

For sensory and motor neurons, this label can tell us about innervation of exterior body parts.

Fly body parts and sensory structures

As well as internal ones.

Fly body parts detailed view

Let’s re-collapse our edgelist, but by cell_class for sensory/effector neurons, and cell_type for everything else.

Let’s take every cell type with at least 100 connections from a sensory neuron, and visualize this as a heatmap:

# Collapse with mixed labels
edgelist_mixed <- edgelist %>%
  left_join(meta %>% select(id = !!sym(dataset_id),
                           pre_cell_type = cell_type,
                           pre_cell_class = cell_class,
                           pre_flow = flow),
           by = c("pre" = "id")) %>%
  left_join(meta %>% select(id = !!sym(dataset_id),
                           post_cell_type = cell_type,
                           post_cell_class = cell_class,
                           post_flow = flow),
           by = c("post" = "id")) %>%
  mutate(
    pre_label = ifelse(pre_flow %in% c("afferent", "efferent") & !is.na(pre_cell_class),
                      pre_cell_class, pre_cell_type),
    post_label = ifelse(post_flow %in% c("afferent", "efferent") & !is.na(post_cell_class),
                       post_cell_class, post_cell_type)
  ) %>%
  filter(!is.na(pre_label), !is.na(post_label))

# Calculate total inputs per post_label for sensory outputs
sensory_post_totals <- edgelist_mixed %>%
  filter(pre_flow == "afferent") %>%
  group_by(post_label) %>%
  summarise(post_total_count = sum(count, na.rm = TRUE), .groups = "drop")

# Sensory outputs (keep connections with at least 100 synapses)
sensory_outputs <- edgelist_mixed %>%
  filter(pre_flow == "afferent") %>%
  group_by(pre_label, post_label) %>%
  summarise(
    total_count = sum(count, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_count >= 100) %>%
  left_join(sensory_post_totals, by = "post_label") %>%
  mutate(norm = total_count / post_total_count) %>%
  select(-post_total_count)

# Create matrix for heatmap
sensory_matrix <- sensory_outputs %>%
  group_by(pre_label, post_label) %>%
  summarise(norm = mean(norm, na.rm = TRUE), .groups = "drop") %>%
  tidyr::pivot_wider(
    names_from = post_label,
    values_from = norm,
    values_fill = 0
  ) %>%
  tibble::column_to_rownames("pre_label") %>%
  as.matrix()
sensory_matrix[is.na(sensory_matrix)] <- 0
sensory_matrix[is.infinite(sensory_matrix)] <- 0
sensory_matrix <- sensory_matrix[rowSums(sensory_matrix)>10,]

# Create static heatmap with pheatmap
pheatmap(
  sensory_matrix,
  clustering_method = "ward.D2",
  show_rownames = TRUE,
  show_colnames = FALSE,
  main = "Sensory Neuron Outputs",
  filename = file.path(img_dir, paste0(dataset, "_sensory_outputs.png")),
  width = 10,
  height = 10
)

# Create interactive heatmap with Ward's clustering (no dendrograms shown)
p_sensory <- heatmaply(
  sensory_matrix,
  dendrogram = "none",
  hclust_method = "ward.D2",
  main = "Sensory Neuron Targeting",
  xlab = "Target Neuron Type",
  ylab = "Sensory Neuron Type",
  showticklabels = c(FALSE, TRUE),
  hide_colorbar = FALSE,
  fontsize_row = 6,
  fontsize_col = 6,
  key.title = "Normalized\nWeight"
)

p_sensory

Interpretable!

If your sample does not have sensory or effector neurons, can you think of well-characterized cell types it does contain, to help you further subset and think about your data?

Your Turn: New subset

Now try this analysis yourself with a different dataset!

Exercise: Switch the pre-prepared subset to front_leg

# To work with a different subset, change these variables at the top:
# subset_name <- "front_leg"
# neuropil_pattern <- "T1"

# Then re-run the entire notebook to see how the results differ!

Extensions

Below are more involved analyses, with longer compute times. Working through these will show you how to cluster neurons by their connection similarity, and investigate the groups.

Connectivity Clusters

There are many different ways to cluster nodes by their connectivity, for example modularity-based algorithms such as Louvain, which igraph implements as cluster_louvain().
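
As a quick sketch of that alternative (not used below), Louvain clustering on a toy graph might look like this; the graph and IDs are invented for illustration, and note that igraph's cluster_louvain() requires an undirected graph, so a directed edgelist must be collapsed first:

```r
library(igraph)

# Toy directed edgelist: two tight triangles joined by one bridge (made-up IDs)
el <- data.frame(
  pre  = c("a", "b", "c", "d", "e", "f", "c"),
  post = c("b", "c", "a", "e", "f", "d", "d")
)
g <- graph_from_data_frame(el, directed = TRUE)

# cluster_louvain() only accepts undirected graphs, so collapse direction
g_undir <- as.undirected(g, mode = "collapse")
comm <- cluster_louvain(g_undir)

membership(comm)  # one community label per neuron
```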

Here, we can take a simple but effective method. First, we will calculate the cosine similarity between all neurons in our edgelist, based on both their input and output connectivity:

# Create connectivity matrix (neurons x partners)
# Combine both pre and post connections for each neuron

# Get all neurons
all_neurons <- union(edgelist$pre, edgelist$post)

# Filter for neurons with sufficient connections
conn_counts <- data.frame(
  id = c(edgelist$pre, edgelist$post)
) %>%
  count(id) %>%
  filter(n >= 10)  # At least 10 connections
neurons_to_use <- intersect(all_neurons, conn_counts$id)

# Create sparse matrix for both inputs and outputs
# Rows = neurons, Cols = all partners (pre or post)
edgelist_filtered <- edgelist %>%
  filter(pre %in% neurons_to_use | post %in% neurons_to_use) %>%
  mutate(norm = ifelse(is.na(norm) | norm == 0, 0.001, norm))  # Avoid zeros

# Prepare for matrix creation
conn_data <- bind_rows(
  edgelist_filtered %>%
    filter(pre %in% neurons_to_use) %>%
    select(neuron = pre, partner = post, weight = norm) %>%
    mutate(type = "output"),
  edgelist_filtered %>%
    filter(post %in% neurons_to_use) %>%
    select(neuron = post, partner = pre, weight = norm) %>%
    mutate(type = "input")
) %>%
  mutate(partner_type = paste(type, partner, sep = "_"))  # Distinguish input vs output

# Create sparse matrix
neuron_factor <- factor(conn_data$neuron, levels = neurons_to_use)
partner_factor <- factor(conn_data$partner_type)
inout_connection_matrix <- sparseMatrix(
  i = as.integer(neuron_factor),
  j = as.integer(partner_factor),
  x = conn_data$weight,
  dims = c(length(neurons_to_use), nlevels(partner_factor)),
  dimnames = list(neurons_to_use, levels(partner_factor))
)
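
```r
# The factor trick above in miniature (illustration only): encoding values as
# a factor with fixed levels yields exactly the integer row/column indices
# that sparseMatrix(i = , j = ) expects.
f_demo <- factor(c("b", "a", "b"), levels = c("a", "b"))
as.integer(f_demo)  # 2 1 2
```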

# Remove all-zero rows
non_zero_rows <- which(rowSums(abs(inout_connection_matrix)) > 0.00001)
inout_connection_matrix <- inout_connection_matrix[non_zero_rows, ]

# Remove all-zero columns
non_zero_cols <- which(colSums(abs(inout_connection_matrix)) > 0.00001)
inout_connection_matrix <- inout_connection_matrix[, non_zero_cols]

# Calculate sparsity
sparsity <- sum(inout_connection_matrix == 0) / prod(dim(inout_connection_matrix))

# Calculate cosine similarity
sparse_matrix <- as(t(inout_connection_matrix), "CsparseMatrix")  # transpose; stays sparse
undirected_cosine_sim_matrix <- cosine_similarity_sparse(sparse_matrix)
undirected_cosine_sim_matrix[is.infinite(undirected_cosine_sim_matrix)] <- 0
cat("Matrix sparsity:", round(sparsity * 100, 2), "%\n")
## Matrix sparsity: 98.69 %
cat("Dimensions:", nrow(undirected_cosine_sim_matrix), "x",
    ncol(undirected_cosine_sim_matrix), "\n")
## Dimensions: 6184 x 6184
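
To make the similarity step concrete, here is a base-R sketch of cosine similarity between the rows of a small dense matrix. It mirrors what `cosine_similarity_sparse()` computes, without the sparse-matrix machinery; the toy matrix is invented for illustration:

```r
# Cosine similarity between all pairs of rows of a matrix:
# sim(i, j) = <x_i, x_j> / (||x_i|| * ||x_j||)
cosine_similarity_dense <- function(m) {
  dot <- m %*% t(m)            # pairwise dot products
  norms <- sqrt(diag(dot))     # row L2 norms
  dot / outer(norms, norms)    # scale to [-1, 1]
}

toy <- rbind(a = c(1, 0, 1), b = c(2, 0, 2), c = c(0, 1, 0))
round(cosine_similarity_dense(toy), 2)
# a and b point in the same direction (similarity 1); both are orthogonal to c
```

Because cosine similarity only depends on direction, neuron `b` (twice the connection weights of `a`) is still maximally similar to `a`; this is why it works well for comparing connectivity profiles of neurons with very different total synapse counts.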

We can use these similarity scores to build a UMAP representation:

# Represent as UMAP
set.seed(42)

umap_result <- uwot::umap(
  undirected_cosine_sim_matrix,
  metric = "cosine",
  n_epochs = 500,
  n_neighbors = min(30, nrow(undirected_cosine_sim_matrix) - 1),
  min_dist = 0.1,
  n_trees = 50,
  n_components = 2,
  verbose = FALSE
)

# Create a data frame with UMAP coordinates
umap_df <- data.frame(
  UMAP1 = umap_result[, 1],
  UMAP2 = umap_result[, 2],
  id = rownames(undirected_cosine_sim_matrix)
) %>%
  left_join(
    meta %>% select(
      id = !!sym(dataset_id),
      cell_type, super_class, cell_class, cell_sub_class,
      flow, region, hemilineage
    ),
    by = "id"
  )


# Plot UMAP colored by super_class with hover text showing ID and cell_type
p_umap_super <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2, color = super_class,
                                     text = paste0("ID: ", id, "\nCell Type: ", cell_type))) +
  geom_point(alpha = 0.6, size = 1.5) +
  labs(
    title = "UMAP of Connectivity Patterns",
    subtitle = "Colored by super_class",
    color = "Super Class"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

save_plot(p_umap_super, paste0(dataset, "_umap_super_class"))
ggplotly(p_umap_super, tooltip = "text")

Next, we will perform hierarchical clustering on the UMAP coordinates to identify connectivity-based clusters:

# Perform hierarchical clustering
dist_matrix <- dist(umap_result, method = "euclidean")
hc <- hclust(dist_matrix, method = "ward.D2")

# Dynamic tree cutting for automatic cluster detection
if (require(dynamicTreeCut, quietly = TRUE)) {
  dynamic_clusters <- cutreeDynamic(
    hc,
    distM = as.matrix(dist_matrix),
    deepSplit = 2,
    minClusterSize = max(5, round(nrow(umap_df) * 0.01))
  )
} else {
  # Fallback: cut tree at fixed height
  dynamic_clusters <- cutree(hc, k = min(12, ceiling(nrow(umap_df) / 10)))
}
##  ..cutHeight not given, setting it to 506  ===>  99% of the (truncated) height range in dendro.
##  ..done.
umap_df$unordered_cluster <- factor(dynamic_clusters)
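
```r
# The fixed-k fallback in miniature (toy 1-D data, not part of the analysis):
# four points form two obvious groups, recovered by cutting the tree at k = 2.
x_toy <- c(1, 2, 10, 11)
hc_toy <- hclust(dist(x_toy), method = "ward.D2")
cutree(hc_toy, k = 2)  # 1 1 2 2
```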

# Calculate centroids of clusters
centroids <- umap_df %>%
  group_by(unordered_cluster) %>%
  summarize(
    UMAP1_centroid = mean(UMAP1),
    UMAP2_centroid = mean(UMAP2),
    size = n()
  )

print(centroids %>% arrange(desc(size)))
## # A tibble: 9 × 4
##   unordered_cluster UMAP1_centroid UMAP2_centroid  size
##   <fct>                      <dbl>          <dbl> <int>
## 1 1                          0.243          0.757  1157
## 2 2                         -3.88           2.14    976
## 3 3                         -7.18           2.65    779
## 4 4                          7.37           1.56    706
## 5 5                          3.28          -0.684   663
## 6 6                          7.05          -2.67    635
## 7 7                         -0.562         -5.37    485
## 8 8                         -7.58          -0.147   481
## 9 9                          3.86          -4.32    302
# Calculate pairwise distances between centroids
dist_centroids <- dist(centroids[, c("UMAP1_centroid", "UMAP2_centroid")],
                      method = "euclidean")

# Order clusters based on hierarchical clustering of centroids
hc_centroids <- hclust(dist_centroids, method = "ward.D2")
dd <- as.dendrogram(hc_centroids)
ordered_cluster <- 1:length(order.dendrogram(dd))
names(ordered_cluster) <- order.dendrogram(dd)

# Map original cluster numbers to new ordered cluster numbers
umap_df$cluster <- ordered_cluster[as.character(umap_df$unordered_cluster)]
umap_df$cluster <- factor(umap_df$cluster, levels = sort(unique(umap_df$cluster)))
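
```r
# The named-vector remapping above in miniature (made-up leaf order):
# order.dendrogram() returns the original cluster labels in left-to-right
# leaf order, so indexing a vector *named* by those labels maps each
# original id to its dendrogram position.
toy_leaf_order <- c(3, 1, 2)                 # cluster 3 is the leftmost leaf
toy_ordered <- seq_along(toy_leaf_order)     # new ids 1, 2, 3
names(toy_ordered) <- toy_leaf_order
toy_ordered[as.character(c(1, 1, 3, 2))]     # 2 2 1 3
```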

# Create color palette
n_clusters <- length(unique(umap_df$cluster))
cluster_colors <- colorRampPalette(c("#E41A1C", "#377EB8", "#4DAF4A",
                                    "#984EA3", "#FF7F00", "#FFFF33"))(n_clusters)
names(cluster_colors) <- sort(unique(umap_df$cluster))

# Calculate cluster centroids for labeling
cluster_centroids <- umap_df %>%
  group_by(cluster) %>%
  summarise(
    UMAP1 = mean(UMAP1),
    UMAP2 = mean(UMAP2),
    n = n()
  )

# Plot UMAP colored by cluster with hover text showing ID and cell_type
p_umap_clusters <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2, color = cluster,
                                        text = paste0("ID: ", id, "\nCell Type: ", cell_type))) +
  geom_point(alpha = 0.7, size = 2) +
  scale_color_manual(values = cluster_colors) +
  geom_text(
    data = cluster_centroids,
    aes(x = UMAP1, y = UMAP2, label = cluster),
    color = "black",
    size = 5,
    fontface = "bold",
    inherit.aes = FALSE
  ) +
  labs(
    title = "Connectivity-Based Clusters",
    subtitle = sprintf("%d clusters identified by hierarchical clustering", n_clusters)
  ) +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5),
    legend.position = "none"
  )

save_plot(p_umap_clusters, paste0(dataset, "_umap_clusters"))
ggplotly(p_umap_clusters, tooltip = "text")
cat("Found", length(unique(dynamic_clusters)), "clusters\n")
## Found 9 clusters

Let’s examine what cell types are in each cluster:

# Summarize clusters by cell_type and super_class
cluster_summary <- umap_df %>%
  filter(!is.na(cluster)) %>%
  count(cluster, super_class, cell_type) %>%
  arrange(cluster, desc(n))

# Top cell types per cluster
top_types_per_cluster <- cluster_summary %>%
  group_by(cluster) %>%
  slice_head(n = 3) %>%
  summarise(
    top_types = paste(cell_type, collapse = ", "),
    .groups = "drop"
  )

print(top_types_per_cluster)
## # A tibble: 9 × 2
##   cluster top_types                      
##   <fct>   <chr>                          
## 1 1       SApp10, SApp09, SApp22         
## 2 2       JO-E, JO-B, JO-F               
## 3 3       JO-B, JO-A, JO-E               
## 4 4       CB1379, unknown, ENS3          
## 5 5       LB3b-c, TPMN1, claw_tpGRN      
## 6 6       BM_Taste, CB0806, CB0859       
## 7 7       BM_InOm, BM_vOcci_vPoOr, BM_Ant
## 8 8       CL121,CL122, AN08B099, DNg12_e 
## 9 9       SAch01, AN05B071, AN09B004
# Super class composition
cluster_super <- umap_df %>%
  filter(!is.na(cluster), !is.na(super_class)) %>%
  count(cluster, super_class) %>%
  group_by(cluster) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup()

# Plot super class composition
p_cluster_comp <- ggplot(cluster_super,
                        aes(x = cluster, y = proportion, fill = super_class)) +
  geom_col() +
  scale_fill_brewer(palette = "Set3") +
  labs(
    title = "Cluster Composition by Super Class",
    x = "Cluster",
    y = "Proportion",
    fill = "Super Class"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 0)
  )

save_plot(p_cluster_comp, paste0(dataset, "_cluster_composition"))
ggplotly(p_cluster_comp)

Sensory and Effector Neuron Focus

Let’s create a special UMAP visualization highlighting sensory and effector neurons:

# Add flow information to UMAP data
umap_df_flow <- umap_df %>%
  mutate(
    is_sensory = super_class == "sensory",
    is_effector = flow == "efferent",  # Use flow column for effector neurons
    display_type = case_when(
      is_sensory ~ "Sensory",
      is_effector ~ "Effector",
      TRUE ~ "Other"
    ),
    point_shape = case_when(
      is_sensory ~ "circle",
      is_effector ~ "square",
      TRUE ~ "circle"
    ),
    display_label = case_when(
      is_sensory ~ cell_sub_class,
      is_effector ~ cell_class,
      TRUE ~ "Other"
    )
  )

# Create plot with hover text showing ID and cell_type
p_umap_sensory_effector <- ggplot(umap_df_flow, aes(x = UMAP1, y = UMAP2)) +
  # Plot "Other" neurons in grey first
  geom_point(
    data = umap_df_flow %>% filter(display_type == "Other"),
    aes(color = display_type, text = paste0("ID: ", id, "\nCell Type: ", cell_type)),
    alpha = 0.3,
    size = 1.5,
    shape = 16
  ) +
  # Plot sensory neurons (circles) colored by cell_sub_class
  geom_point(
    data = umap_df_flow %>% filter(is_sensory),
    aes(color = display_label, text = paste0("ID: ", id, "\nCell Type: ", cell_type)),
    alpha = 0.8,
    size = 2.5,
    shape = 16  # circle
  ) +
  # Plot effector neurons (squares) colored by cell_class
  geom_point(
    data = umap_df_flow %>% filter(is_effector),
    aes(color = display_label, text = paste0("ID: ", id, "\nCell Type: ", cell_type)),
    alpha = 0.8,
    size = 2.5,
    shape = 15  # square
  ) +
  scale_color_manual(
    values = c(
      "Other" = "grey70",
      setNames(
        rainbow(length(unique(c(
          umap_df_flow$cell_sub_class[umap_df_flow$is_sensory],
          umap_df_flow$cell_class[umap_df_flow$is_effector]
        )))),
        unique(c(
          umap_df_flow$cell_sub_class[umap_df_flow$is_sensory],
          umap_df_flow$cell_class[umap_df_flow$is_effector]
        ))
      )
    ),
    name = "Cell Class"
  ) +
  labs(
    title = "UMAP: Sensory and Effector Neurons",
    subtitle = "Sensory = circles (cell sub-class), effector = squares (cell class)",
    x = "UMAP 1",
    y = "UMAP 2"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

save_plot(p_umap_sensory_effector, paste0(dataset, "_umap_sensory_effector"))
ggplotly(p_umap_sensory_effector, tooltip = "text")

Cluster Network with Sensory-Effector Nodes

Finally, let’s create a summary network where each node represents one of:

  • A connectivity-based cluster (for internal neurons)
  • An individual sensory cell class (grey nodes)
  • An individual effector cell class (black nodes)

# Create node assignments: cluster for interneurons, cell_class for sensory/effector
umap_df_annotated <- umap_df %>%
  mutate(
    node_label = case_when(
      flow == "afferent" & !is.na(cell_class) ~ paste0("Sensory: ", cell_class),
      flow == "efferent" & !is.na(cell_class) ~ paste0("Effector: ", cell_class),
      TRUE ~ paste0("Cluster ", cluster)
    ),
    node_type = case_when(
      flow == "afferent" ~ "sensory",
      flow == "efferent" ~ "effector",
      TRUE ~ "cluster"
    )
  )

# Create edgelist with node labels (also track cluster for coloring)
edgelist_cluster_network <- edgelist %>%
  left_join(
    umap_df_annotated %>% select(id, pre_node_label = node_label, pre_node_type = node_type, pre_cluster = cluster),
    by = c("pre" = "id")
  ) %>%
  left_join(
    umap_df_annotated %>% select(id, post_node_label = node_label, post_node_type = node_type, post_cluster = cluster),
    by = c("post" = "id")
  ) %>%
  filter(!is.na(pre_node_label), !is.na(post_node_label),
         pre_node_label != post_node_label) %>%
  group_by(pre_node_label, post_node_label, pre_node_type, post_node_type, pre_cluster) %>%
  summarise(
    synapse_count = sum(count, na.rm = TRUE),
    weight = sum(count, na.rm = TRUE),
    n_connections = n(),
    .groups = "drop"
  ) %>%
  filter(synapse_count >= 100)  # Keep substantial connections

# Create vertices with node types and cluster info for coloring
vertices_cluster <- umap_df_annotated %>%
  group_by(node_label, node_type, cluster) %>%
  summarise(size = n(), .groups = "drop") %>%
  rename(name = node_label) %>%
  filter(name %in% c(edgelist_cluster_network$pre_node_label,
                    edgelist_cluster_network$post_node_label)) %>%
  distinct(name, .keep_all = TRUE)

# Create graph
g_cluster <- graph_from_data_frame(
  d = edgelist_cluster_network %>% select(pre_node_label, post_node_label,
                                          weight, synapse_count),
  vertices = vertices_cluster,
  directed = TRUE
)

# Convert to tidygraph
g_cluster_tidy <- as_tbl_graph(g_cluster) %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(mode = "total"))

# Create layout with increased repulsion (larger area for better spacing)
layout_cluster <- create_layout(g_cluster_tidy, layout = "fr", niter = 1000,
                                area = vcount(g_cluster_tidy)^2.5)

# Create node color mapping: use cluster colors for clusters, fixed colors for sensory/effector
layout_cluster <- layout_cluster %>%
  mutate(
    node_color = case_when(
      node_type == "sensory" ~ "grey60",
      node_type == "effector" ~ "black",
      node_type == "cluster" & !is.na(cluster) ~ cluster_colors[as.character(cluster)],
      TRUE ~ "steelblue"
    )
  )

# Plot
p_cluster_network <- ggraph(layout_cluster) +
  geom_edge_arc(
    aes(width = weight, alpha = weight),
    arrow = arrow(length = unit(2, 'mm'), type = "closed"),
    start_cap = circle(4, 'mm'),
    end_cap = circle(4, 'mm'),
    strength = 0.3,
    color = "grey60"
  ) +
  geom_node_point(
    aes(size = size, color = node_color),
    alpha = 0.8
  ) +
  geom_node_text(
    aes(label = name),
    repel = TRUE,
    size = 2.5,
    fontface = "bold"
  ) +
  scale_edge_width(range = c(0.2, 1.5), name = "Synapse\nCount") +
  scale_edge_alpha(range = c(0.5, 1.0)) +
  scale_size_continuous(range = c(3, 12), name = "Neuron\nCount") +
  scale_color_identity() +
  labs(
    title = "Cluster Network with Sensory-Effector Nodes",
    subtitle = "Nodes = connectivity clusters + sensory/effector cell classes"
  ) +
  theme_graph() +
  theme(
    legend.position = "right",
    plot.title = element_text(face = "bold", size = 14)
  )

save_plot(p_cluster_network, paste0(dataset, "_cluster_network"), width = 14, height = 12)
p_cluster_network

cat("Network nodes:", vcount(g_cluster), "\n")
## Network nodes: 31
cat("Network edges:", ecount(g_cluster), "\n")
## Network edges: 213
cat("Sensory nodes:", sum(vertices_cluster$node_type == "sensory", na.rm = TRUE), "\n")
## Sensory nodes: 11
cat("Effector nodes:", sum(vertices_cluster$node_type == "effector", na.rm = TRUE), "\n")
## Effector nodes: 11
cat("Cluster nodes:", sum(vertices_cluster$node_type == "cluster", na.rm = TRUE), "\n")
## Cluster nodes: 9

Visualise cluster morphologies

Remembering what we learned in tutorial 02, we can read the .swc files for each cluster and visualise their morphologies, to get a sense of what we are dealing with.

# Sample a few neurons per cluster for visualization
set.seed(42)
neurons_per_cluster <- 10
max_clusters_to_viz <- min(8, n_clusters)  # Visualize up to 8 clusters

# Get sample of neurons for each cluster
# Note: slice_sample will use all available neurons if n exceeds group size
cluster_samples <- umap_df %>%
  filter(cluster %in% 1:max_clusters_to_viz) %>%
  group_by(cluster) %>%
  slice_sample(n = neurons_per_cluster) %>%
  ungroup()

# Download skeletons
swc_path <- file.path(data_path, dataset_base, "obj", "skeletons")

# Create temporary directory for SWC files
temp_swc_dir <- tempdir()

# Read neurons for each cluster
neurons_by_cluster <- list()
for (cl in 1:max_clusters_to_viz) {
  cluster_ids <- cluster_samples %>%
    filter(cluster == cl) %>%
    pull(id)
  neurons_list <- list()
  for (nid in cluster_ids) {
    neuron <- read_swc_neuron(nid, swc_path, temp_swc_dir)
    if (!is.null(neuron)) {
      neurons_list[[as.character(nid)]] <- neuron
    }
  }
  if (length(neurons_list) > 0) {
    neurons_by_cluster[[cl]] <- as.neuronlist(neurons_list)
  }
}

Now let’s visualize each cluster’s morphologies:

# neurons_by_cluster is indexed by position and unnamed; clusters with no
# readable neurons leave NULL entries, so test length rather than names().
if (length(neurons_by_cluster) >= 1 && length(neurons_by_cluster[[1]]) > 0) {
  nclear3d()
  dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
  plot3d(neurons_by_cluster[[1]], col = cluster_colors[1], soma = 10000)
}
if (length(neurons_by_cluster) >= 2 && length(neurons_by_cluster[[2]]) > 0) {
  nclear3d()
  dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
  plot3d(neurons_by_cluster[[2]], col = cluster_colors[2], soma = 10000)
}
if (length(neurons_by_cluster) >= 3 && length(neurons_by_cluster[[3]]) > 0) {
  nclear3d()
  dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
  plot3d(neurons_by_cluster[[3]], col = cluster_colors[3], soma = 10000)
}
if (length(neurons_by_cluster) >= 4 && length(neurons_by_cluster[[4]]) > 0) {
  nclear3d()
  dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
  plot3d(neurons_by_cluster[[4]], col = cluster_colors[4], soma = 10000)
}
if (length(neurons_by_cluster) >= 5 && length(neurons_by_cluster[[5]]) > 0) {
  nclear3d()
  dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
  plot3d(neurons_by_cluster[[5]], col = cluster_colors[5], soma = 10000)
}
if (length(neurons_by_cluster) >= 6 && length(neurons_by_cluster[[6]]) > 0) {
  nclear3d()
  dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
  plot3d(neurons_by_cluster[[6]], col = cluster_colors[6], soma = 10000)
}
if (length(neurons_by_cluster) >= 7 && length(neurons_by_cluster[[7]]) > 0) {
  nclear3d()
  dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
  plot3d(neurons_by_cluster[[7]], col = cluster_colors[7], soma = 10000)
}
if (length(neurons_by_cluster) >= 8 && length(neurons_by_cluster[[8]]) > 0) {
  nclear3d()
  dummy <- plot3d(brain_mesh, col = "lightgrey", alpha = 0.1)
  plot3d(neurons_by_cluster[[8]], col = cluster_colors[8], soma = 10000)
}

Summary

In this tutorial, we covered:

  1. Loading connectivity data - Working with edgelists and meta data
  2. Basic network statistics - Degree distributions, correlations, small-world metrics
  3. Network visualization - Graph plots and connectivity matrices
  4. Connectivity matrices - Heatmaps of cell type connectivity
  5. Sensory-effector analysis - Interpretable input-output patterns
  6. Connectivity-based clustering - Using cosine similarity and UMAP to identify functional groups

Key Takeaways

  • Edgelists describe directed, weighted graphs of neural connectivity
  • Cosine similarity is effective for comparing connectivity patterns
  • UMAP reveals structure in high-dimensional connectivity data
  • Sensory and effector neurons provide interpretable anchors for analysis
  • Hierarchical clustering can identify functional groups based on connectivity

Session Information

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] influencer_0.1.0         remotes_2.5.0            doSNOW_1.0.20           
##  [4] snow_0.4-4               iterators_1.0.14         foreach_1.5.2           
##  [7] readobj_0.4.1            dynamicTreeCut_1.63-1    lsa_0.73.3              
## [10] SnowballC_0.7.0          tidygraph_1.2.3          ggraph_2.2.1            
## [13] igraph_1.5.1             htmlwidgets_1.6.4        uwot_0.1.14             
## [16] Matrix_1.6-1.1           umap_0.2.10.0            pheatmap_1.0.12         
## [19] heatmaply_1.4.0          viridis_0.6.5            viridisLite_0.4.2       
## [22] ggdendro_0.1.23          duckdb_0.9.2-1           DBI_1.2.3               
## [25] plotly_4.11.0            nat.flybrains_1.8.2      nat.templatebrains_1.2.1
## [28] nat.nblast_1.6.7         nat_1.11.0               rgl_1.2.8               
## [31] patchwork_1.1.3          forcats_0.5.2            stringr_1.6.0           
## [34] dplyr_1.1.4              purrr_1.1.0              readr_2.1.5             
## [37] tidyr_1.3.1              tibble_3.3.0             ggplot2_4.0.0.9000      
## [40] tidyverse_1.3.2          arrow_16.1.0            
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.4.1        backports_1.5.0     spam_2.10-0        
##   [4] systemfonts_1.2.3   plyr_1.8.9          lazyeval_0.2.2     
##   [7] splines_4.2.1       crosstalk_1.2.0     digest_0.6.37      
##  [10] ca_0.71.1           htmltools_0.5.8.1   magrittr_2.0.4     
##  [13] memoise_2.0.1       googlesheets4_1.1.1 tzdb_0.4.0         
##  [16] graphlayouts_1.1.1  modelr_0.1.11       extrafont_0.18     
##  [19] extrafontdb_1.0     askpass_1.2.1       blob_1.2.4         
##  [22] rvest_1.0.3         rappdirs_0.3.3      ggrepel_0.9.5      
##  [25] textshaping_0.3.6   haven_2.5.1         xfun_0.54          
##  [28] crayon_1.5.3        jsonlite_2.0.0      glue_1.8.0         
##  [31] polyclip_1.10-4     registry_0.5-1      gtable_0.3.6       
##  [34] gargle_1.6.0        webshot_0.5.5       Rttf2pt1_1.3.11    
##  [37] scales_1.4.0        Rcpp_1.0.11         reticulate_1.34.0  
##  [40] bit_4.6.0           dotCall64_1.1-1     httr_1.4.7         
##  [43] RColorBrewer_1.1-3  nabor_0.5.0         pkgconfig_2.0.3    
##  [46] farver_2.1.2        sass_0.4.8          dbplyr_2.2.1       
##  [49] utf8_1.2.6          here_1.0.2          reshape2_1.4.4     
##  [52] labeling_0.4.3      tidyselect_1.2.1    rlang_1.1.6        
##  [55] cellranger_1.1.0    tools_4.2.1         cachem_1.1.0       
##  [58] cli_3.6.5           generics_0.1.4      RSQLite_2.3.4      
##  [61] broom_1.0.6         evaluate_1.0.5      fastmap_1.2.0      
##  [64] ragg_1.2.4          yaml_2.3.10         knitr_1.50         
##  [67] bit64_4.6.0-1       fs_1.6.3            filehash_2.4-6     
##  [70] dendroextras_0.2.3  nat.utils_0.6.1     dendextend_1.17.1  
##  [73] nlme_3.1-160        xml2_1.3.6          compiler_4.2.1     
##  [76] rstudioapi_0.17.1   png_0.1-8           reprex_2.0.2       
##  [79] tweenr_2.0.2        bslib_0.6.1         stringi_1.8.3      
##  [82] RSpectra_0.16-1     lattice_0.20-45     vctrs_0.6.5        
##  [85] pillar_1.11.1       lifecycle_1.0.4     jquerylib_0.1.4    
##  [88] RcppAnnoy_0.0.20    data.table_1.16.2   seriation_1.4.0    
##  [91] R6_2.6.1            TSP_1.2-1           gridExtra_2.3      
##  [94] codetools_0.2-18    dichromat_2.0-0.1   MASS_7.3-58.1      
##  [97] assertthat_0.2.1    rprojroot_2.1.1     openssl_2.3.4      
## [100] withr_3.0.2         mgcv_1.8-41         parallel_4.2.1     
## [103] hms_1.1.3           grid_4.2.1          rmarkdown_2.30     
## [106] S7_0.2.0            googledrive_2.1.1   ggforce_0.4.1      
## [109] lubridate_1.8.0     base64enc_0.1-3